代价

下面是我对 S 型函数的实现过程：

class Sigmoid(Node):
    def __init__(self, node):
        Node.__init__(self, [node])

    def _sigmoid(self, x):
        """
        This method is separate from `forward` because it
        will be used with `backward` as well.

        `x`: A numpy array-like object.
        """
        return 1. / (1. + np.exp(-x)) # the `.` ensures that `1` is a float

    def forward(self):
        input_value = self.inbound_nodes[0].value
        self.value = self._sigmoid(input_value)

你可能觉得奇怪，为何 _sigmoid 具有单独的方法。正如在 S 型函数（等式 (4)）的导数中看到的，S 型函数实际上是它自己的导数的一部分。将 _sigmoid 分离出来意味着你不需要为前向传播和反向传播实现两次。

这很不错！此时，你已经使用了权重和偏置来计算输出。并且你使用了激活函数来对输出进行分类。你可能还记得，神经网络通过修改权重和偏置（根据标签化的数据集进行训练）改善输出的精确度。

我们可以采用多种技巧来定义神经网络的精确度，所有技巧围绕的都是神经网络是否能够生成与已知正确的值非常接近的值。人们用不同的名称来表示这一精确度测量者，通常称之为损失或代价。我将经常使用代价一词。

对于本测验，你将使用均方差 (MSE) 计算代价。如下所示：

C(w, b) = \frac{1}{m}\sum_x || y(x) - a || ^2

等式 (5)

此处，w 表示网络中所有的权重集合，b 表示所有的偏置，m 表示训练示例的总数，a 是 y(x) 的近视值，a 和 y(x) 都是长度相同的向量。

权重集合是所有权重矩阵压平成的向量，串联成一个大的向量。偏置也相似，但是它们已经是向量，所以在串联前不需要压平。

以下是创建 w 的代码示例：

# 2 by 2 matrices
w1  = np.array([[1, 2], [3, 4]])
w2  = np.array([[5, 6], [7, 8]])

# flatten
w1_flat = np.reshape(w1, -1)
w2_flat = np.reshape(w2, -1)

w = np.concatenate((w1_flat, w2_flat))
# array([1, 2, 3, 4, 5, 6, 7, 8])

这样可以轻松地将神经网络中使用的所有权重和偏置提取出来，从而更轻松地编写代码，我们将在接下来的梯度下降部分看到。

注意：你不需要在你的代码中实现！只是将权重和偏置看做集合比单独对待更容易处理。

代价 C 取决于正确输出 y(x) 和网络的输出 a 之间的差值。很容易看出 y(x) 和 a (对于 x 的所有值）) 之间的差始终不为 0。

这是理想情况，实际上学习流程就是为了尽量减小代价。

现在请你来计算代价。

你在上道测验的前向中实现了这一网络。

现在可以看出它输出的是无用数据。S 型节点的激活什么也没表示，因为该网络没有可以与之对比的带标签输出。此外，权重和偏置无法更改，没有代价的话，无法进行学习。

说明

对于这道测验，你将对 nn.py 中的网络运行前向传递。请完成 MSE 方法的实现，使其能够根据上述等式计算代价。

建议使用 np.square (文档) 方法，这样代码编写起来更轻松。

查看 nn.py，看看 MSE 将如何计算代价。
打开 miniflow.py. 完成 MSE 的构建。
测试你的网络！通过 nn.py 并根据输入判断代价是否合理。

Start Quiz:

nn.py miniflow.py

"""
Test your MSE method with this script!

No changes necessary, but feel free to play
with this script to test your network.
"""

import numpy as np
from miniflow import *

y, a = Input(), Input()
cost = MSE(y, a)

y_ = np.array([1, 2, 3])
a_ = np.array([4.5, 5, 10])

feed_dict = {y: y_, a: a_}
graph = topological_sort(feed_dict)
# forward pass
forward_pass(graph)

"""
Expected output

23.4166666667
"""
print(cost.value)

import numpy as np


class Node(object):
    """
    Base class for nodes in the network.

    Arguments:

        `inbound_nodes`: A list of nodes with edges into this node.
    """
    def __init__(self, inbound_nodes=[]):
        """
        Node's constructor (runs when the object is instantiated). Sets
        properties that all nodes need.
        """
        # A list of nodes with edges into this node.
        self.inbound_nodes = inbound_nodes
        # The eventual value of this node. Set by running
        # the forward() method.
        self.value = None
        # A list of nodes that this node outputs to.
        self.outbound_nodes = []
        # Sets this node as an outbound node for all of
        # this node's inputs.
        for node in inbound_nodes:
            node.outbound_nodes.append(self)

    def forward(self):
        """
        Every node that uses this class as a base class will
        need to define its own `forward` method.
        """
        raise NotImplementedError


class Input(Node):
    """
    A generic input into the network.
    """
    def __init__(self):
        # The base class constructor has to run to set all
        # the properties here.
        #
        # The most important property on an Input is value.
        # self.value is set during `topological_sort` later.
        Node.__init__(self)

    def forward(self):
        # Do nothing because nothing is calculated.
        pass


class Linear(Node):
    """
    Represents a node that performs a linear transform.
    """
    def __init__(self, X, W, b):
        # The base class (Node) constructor. Weights and bias
        # are treated like inbound nodes.
        Node.__init__(self, [X, W, b])

    def forward(self):
        """
        Performs the math behind a linear transform.
        """
        X = self.inbound_nodes[0].value
        W = self.inbound_nodes[1].value
        b = self.inbound_nodes[2].value
        self.value = np.dot(X, W) + b


class Sigmoid(Node):
    """
    Represents a node that performs the sigmoid activation function.
    """
    def __init__(self, node):
        # The base class constructor.
        Node.__init__(self, [node])

    def _sigmoid(self, x):
        """
        This method is separate from `forward` because it
        will be used with `backward` as well.

        `x`: A numpy array-like object.
        """
        return 1. / (1. + np.exp(-x))

    def forward(self):
        """
        Perform the sigmoid function and set the value.
        """
        input_value = self.inbound_nodes[0].value
        self.value = self._sigmoid(input_value)


class MSE(Node):
    def __init__(self, y, a):
        """
        The mean squared error cost function.
        Should be used as the last node for a network.
        """
        # Call the base class' constructor.
        Node.__init__(self, [y, a])

    def forward(self):
        """
        Calculates the mean squared error.
        """
        # NOTE: We reshape these to avoid possible matrix/vector broadcast
        # errors.
        #
        # For example, if we subtract an array of shape (3,) from an array of shape
        # (3,1) we get an array of shape(3,3) as the result when we want
        # an array of shape (3,1) instead.
        #
        # Making both arrays (3,1) insures the result is (3,1) and does
        # an elementwise subtraction as expected.
        y = self.inbound_nodes[0].value.reshape(-1, 1)
        a = self.inbound_nodes[1].value.reshape(-1, 1)
        # TODO: your code here
        pass


def topological_sort(feed_dict):
    """
    Sort the nodes in topological order using Kahn's Algorithm.

    `feed_dict`: A dictionary where the key is a `Input` Node and the value is the respective value feed to that Node.

    Returns a list of sorted nodes.
    """

    input_nodes = [n for n in feed_dict.keys()]

    G = {}
    nodes = [n for n in input_nodes]
    while len(nodes) > 0:
        n = nodes.pop(0)
        if n not in G:
            G[n] = {'in': set(), 'out': set()}
        for m in n.outbound_nodes:
            if m not in G:
                G[m] = {'in': set(), 'out': set()}
            G[n]['out'].add(m)
            G[m]['in'].add(n)
            nodes.append(m)

    L = []
    S = set(input_nodes)
    while len(S) > 0:
        n = S.pop()

        if isinstance(n, Input):
            n.value = feed_dict[n]

        L.append(n)
        for m in n.outbound_nodes:
            G[n]['out'].remove(m)
            G[m]['in'].remove(n)
            # if no other incoming edges add to S
            if len(G[m]['in']) == 0:
                S.add(m)
    return L


def forward_pass(graph):
    """
    Performs a forward pass through a list of sorted Nodes.

    Arguments:

        `graph`: The result of calling `topological_sort`.
    """
    # Forward pass
    for n in graph:
        n.forward()

Next Concept